A Two-Level Computer Formalism for the Analysis of Bantu Morphology: An Application to Swahili
Abstract
SWATWOL is a computer program designed for the morphological analysis of Standard Swahili texts. It is based on Koskenniemi's (1983) well-known two-level model. A number of applications of this model to various languages already exist. Some of these are very ambitious and almost complete (Koskenniemi 1983; Karlsson 1992), while others are still in various stages of development. So far, the implementations have concerned languages in which inflection and derivation are handled primarily by suffixation. This is the first effort to apply the two-level model to a primarily prefixing language. SWATWOL includes a full description of Swahili inflectional morphology and morphophonology and contains a lexicon system with more than 25 000 lexical entries. The program identifies and analyzes all correct word-forms, provided that the lexical items are listed in the lexicon. The extensive tests on various types of Swahili texts described below indicate that the coverage and precision of the program are close to perfection.
Related resources
Disambiguation of morphological analysis in Bantu languages
The paper describes problems in disambiguating the morphological analysis of Bantu languages, using Swahili as a test language. The main factors of ambiguity in this language group can be traced to the noun class structure on the one hand and to the bi-directional word-formation on the other. In analyzing word-forms, the system applied utilizes SWATWOL, a morphological parsing program based on tw...
Morphological Parsing of Tone: An Experiment with Two-Level Morphology on the Ha Language
Morphological parsers are typically developed for languages without contrastive tonal systems. Ha, a typical Bantu language of Western Tanzania, poses a challenge to these parsers with both lexical and grammatical pitch-accent that would, in order to describe the tonal phenomena, seem to require an approach with a separate level for the tones. However, since the Two-Level Morphology (Koskenni...
Developments of Swahili resources for an automatic speech recognition system
This article describes our efforts to provide ASR resources for Swahili, a Bantu language spoken in a wide area of East Africa. We start with an introduction to the language situation, at both the linguistic and the digital level. Then, we report the strategies selected to develop a text corpus, a pronunciation dictionary and a speech corpus for this under-resourced language. We explore methodologies a...
Optimizing disambiguation in Swahili
This paper argues that an optimal solution to disambiguation is a combination of linguistically motivated rules and resolution based on probability or heuristic rules. Disambiguation here means ambiguity resolution on all levels of language analysis, including morphology and semantics. The discussion is based on Swahili, for which a comprehensive analysis system has been develope...
Using Syllables as Features in Morpheme Tagging in Swahili
Utilizing corpora to build morphological analyzers for the purposes of computational application has been addressed in many different ways. Methods for automated morphological analysis generally focus on segmentation from raw text, and ignore the actual learning of what morpheme features are present. Other methods are time-consuming and require a great deal of prior knowledge of the language su...
Publication date: 2005